50 research outputs found

    A methodology to implement real-time applications on reconfigurable circuits

    Special Issue: Engineering of Configurable Systems. International audience. This paper presents an extension of our AAA rapid prototyping methodology for the optimized implementation of real-time applications onto reconfigurable circuits. This extension is based on a unified model of factorized data dependence graphs, used both to specify the application algorithm and to deduce the possible implementations onto reconfigurable hardware in terms of graph transformations. This transformation flow has been implemented in SynDEx, a system-level CAD software tool.

    Modèle d'exécutif distribué temps réel pour SynDEx [Distributed real-time executive model for SynDEx]

    Projet SOSSO. This document is intended for designers of embedded distributed real-time applications who wish to optimize the implementation of their control, signal processing, and image processing algorithms on multiprocessor architectures. It is aimed more specifically at users of SynDEx v4, a system-level CAD software tool that supports the "Adéquation Algorithme Architecture" (AAA) methodology, which was developed to improve their productivity. Its first aim is to help this audience understand the ins and outs of the methodology and its models, in particular the distributed executives generated by SynDEx, which are optimized for real time and for embedded multiprocessor architectures. Its second aim is to enable experienced readers to port the set of macros that make up the "SynDEx v4 generic executive kernel", in order to obtain an executive generator specific to a new type of processor.

    Optimizations for real-time implementation of H264/AVC video encoder on DSP processor

    International audience. Real-time H.264/AVC high-definition video encoding represents a challenging workload for most existing programmable processors. New programmable processor technologies such as the Graphics Processing Unit (GPU) and the multicore Digital Signal Processor (DSP) offer a very promising way to overcome these constraints. In this paper, an optimized implementation of an H264/AVC video encoder on a single core among the six cores of the TMS320C6472 DSP is presented for Common Intermediate Format (CIF) (352x288) resolution, as a first step towards a multicore implementation for standard and high definitions (SD, HD). Algorithmic optimization is applied to the intra prediction module to reduce the computational time. Furthermore, based on the DSP architectural features, various structural and hardware optimizations are adopted to minimize external memory access. The parallelism between CPU processing and data transfers is fully exploited using an Enhanced Direct Memory Access (EDMA) controller. Experimental results show that the proposed optimizations, taken together, improve the encoding speed by up to 42.91% on a single core running at 700 MHz for CIF resolution. They allow real-time encoding at 25 f/s without inducing any Peak Signal to Noise Ratio (PSNR) degradation or bit-rate increase, and they make it possible to achieve real-time implementation for SD and HD resolutions when exploiting multicore features.

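    Implantation optimisée sur circuit dédié d'algorithmes spécifiés sous la forme d'un Graphe Factorisé de Dépendances de Données : application aux traitements d'images [Optimized implementation on a dedicated circuit of algorithms specified as a Factorized Data Dependence Graph: application to image processing]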

    We present a rapid prototyping flow for the optimized implementation of real-time applications on dedicated circuits. This flow is based on a unified model of factorized data dependence graphs, used both to specify the algorithm and to deduce its possible implementations in terms of graph transformations.

    From Algorithm and Architecture Specifications to Automatic Generation of Distributed Real-Time Executives: a Seamless Flow of Graphs Transformations

    International audience. This paper presents a seamless flow of transformations which generates a dedicated distributed executive from a high-level specification of an (algorithm, architecture) pair. This work is based upon graph models and graph transformations and is part of the AAA methodology. We present an original architecture model which allows accurate modeling of sequencers, memory allocation, and heterogeneous inter-processor communications in both shared-memory and message-passing modes. Then we present the flow of transformations that leads to the automatic generation of dedicated real-time distributed executives which are deadlock free. This transformation flow has been implemented in a system-level CAD software tool called SynDEx.

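    MODELISATION D'ARCHITECTURES PARALLELES HETEROGENES POUR LA GENERATION AUTOMATIQUE D'EXECUTIFS DISTRIBUES TEMPS REEL OPTIMISES [Modeling of heterogeneous parallel architectures for the automatic generation of optimized distributed real-time executives]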

    The optimized implementation of reactive algorithms on parallel architectures is a complex problem that we propose to solve with the Adéquation Algorithme Architecture methodology. From the specification of an algorithm and of an architecture, it makes it possible to rapidly build the optimized program of each processor. Each program consists of an application part, which corresponds to the algorithm, and an executive part. This executive cannot be built without a precise model of the architecture, of the algorithm, and of their matching (optimized implementation). We propose a new architecture model that is generic enough to model as many architectures as possible and is well suited to the optimization and automatic generation of executives. A machine is modeled by a directed graph whose vertices correspond to its computation and communication sequencers, its buses, and its memories. The algorithm is modeled by a directed hypergraph whose vertices are computation operations and whose arcs are the data dependences between these operations. By transforming these two graphs we construct the set of implementation graphs. Each of them describes the distribution and scheduling of the computations and communications. Using an optimization heuristic, we choose among these implementations the one that has the shortest execution time and minimizes memory. Next, we formalize the transformations of the optimized implementation graph into an optimized distributed executive, which relies on an intermediate macro-code in order to support as many architectures as possible. This executive is tailored to the application in order to minimize its overhead and guarantee the precise scheduling of the computations and communications. Finally, we present the development of the SynDEx software, which supports the methodology presented, as well as its use in two concrete applications. ORSAY-PARIS 11-BU Sciences (914712101) / Sudoc, France

    A rapid prototyping methodology to implement and optimizing image processing algorithms for FPGAs

    Electronic version (10 pp.). International audience.

    A New Modelling Framework for Coarse-Grained Programmable Architectures

    Accepted for 2020, presentation delayed to 2021. International audience. Coarse-grained reconfigurable architectures (CGRA) are designed to deliver high-performance computing while drastically reducing the latency of the computing system. Although they are often highly optimized for a specific domain, they keep several levels of flexibility so that they can be reused. However, their reuse is generally limited by the complexity of identifying the best allocation of new tasks onto the hardware resources. Another limiting point is the complexity of producing a reliable performance analysis for each new implementation. To solve this problem, we propose to consider a CGRA as a programmable, configuration-driven computing fabric, called a Coarse-Grained Programmable Architecture (CGPA). We propose a new latency-based model to describe all hardware elements. We demonstrate how to implicitly model, with the help of latency prediction, the heterogeneity of their material implementations. Our model also makes it possible to assess the configuration cost, which is often neglected in other works. The design of the modelling framework allows it to become part of a complete application mapping and scheduling chain, up to the automated generation of the execution context, thus maximizing the reusability of the given CGPA.

    Learning System for Defactorization Factor Classification of Factorized Data Dependence Graph

    International audience. When faced with a concrete multi-objective optimization problem, the principal difficulty is choosing a method that produces optimal solutions. This choice relies on the knowledge and expertise of the user. In this context, we are interested in the design flow based on the AAA (Adequacy Algorithm Architecture) methodology. The extension of this methodology to circuits allows the exploitation of potential parallelism on components. It aims to obtain a real-time implementation which respects the temporal constraint of the application while minimizing the resources. For an algorithm specified as a data flow graph, this exploration of parallelism is an NP-complete problem. In this work, we propose a new solution that performs a multi-objective exploration by integrating an SVM (Support Vector Machine) learning capability into an agent. We validate our model by a simulation based on the greedy heuristic results of SynDEX-IC (Synchronized Distributed Executive for Integrated circuit) [1] examples.